R Coding Club
RTG 2660
2024-01-16
What is a coding club?
“Coding Club is for everyone, regardless of their career stage or current level of knowledge. Coding Club is a place that brings people together, regardless of their gender or background. We all have the right to learn, and we believe learning is more fun and efficient when we help each other along the way.” (https://ourcodingclub.github.io/)
Doing statistical calculations by hand? Tedious & error-prone! Computers are faster…
Using spreadsheets? Limited options, easy to change data accidentally…
Using point-and-click software (e.g. SPSS)?
proprietary software = expensive
R = open, extensible (community)
reproducible!
You’ll learn to program!
You should all have installed both R and RStudio by now! Who had problems doing so?
RStudio Interface
Script pane/window -> to save your code
Console -> here the commands are run
Environment -> shows which variables/data frames are currently saved
Files, Plots, Help etc. -> the Files tab shows you the files in the folder you’re currently in
Console used as calculator
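A few example calculations you could type into the console (these are illustrative; the exact commands from the session are not reproduced here). Each line is run when you press Enter, and the result is printed right away:

```r
# Type each line into the console and press Enter
100 + 1      # addition
7 - 2        # subtraction
6 * 7        # multiplication
10 / 4       # division: 2.5
2^3          # exponent ("to the power of"): 8
sqrt(9)      # square root, using the function sqrt(): 3
```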
“<-” is used to assign values to variables (“=” also works but is not preferred)
a, multi etc. are the variable names; they can be whole words, but no whitespace is allowed
As you can see, variables can contain different types: numbers, strings/characters (= words) etc.
Assigning a value produces no output in the console!
the variables contain the calculated value (i.e. 101) and not the calculation/formula (100+1)
You can use those variables for further calculations, e.g. a + multi
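A small sketch of assignment. The name a and the value 100 + 1 come from the text above; the value stored in multi and the variable greeting are hypothetical examples:

```r
# Assign values to variables with <-
a <- 100 + 1          # a now holds the result, 101 (not the formula)
multi <- 3 * 4        # hypothetical example value: 12
greeting <- "hello"   # hypothetical string/character value (note the quotes)

# Assigning prints nothing; type a variable's name to see its value
a

# Variables can be used in further calculations
a + multi             # 113
```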
It makes sense to save all your scripts etc. in a folder specifically dedicated to this course.
Make sure that R knows that you want to work in this folder, i.e. set your working directory with setwd() (or in RStudio via Session -> Set Working Directory -> Choose Directory…).
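A minimal sketch of checking and setting the working directory. The path in the comment is hypothetical; so that the example runs anywhere, it creates an "R_Club" folder inside a temporary directory instead. Note that R paths use forward slashes, even on Windows:

```r
# Which folder is R currently working in?
getwd()

# On your own computer, use the real path to your course folder, e.g.
# (hypothetical path): setwd("C:/Users/yourname/Documents/R_Club")

# For illustration only: create an "R_Club" folder in a temporary directory
course_dir <- file.path(tempdir(), "R_Club")
dir.create(course_dir, showWarnings = FALSE)
setwd(course_dir)

# Confirm the change
getwd()
```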
Assignment: Please make a folder, e.g. called “R_Club” (but not “R” or anything with spaces in it). Then set your working directory to this folder.
If you type all your commands/code into the console, they might get lost or you might not remember what you did, and you have to type everything in again if you want to rerun it with slight changes. Also, code typed into the console is not saved as a file.
Therefore, it is better practice to write scripts. Scripts are basically text files that contain your code.
To open a new script, click File -> New File -> R Script.
To run a line of the script, you can either click Run at the top right of the pane or press ctrl + enter. It will always run the line where the cursor is located (or the lines that you have selected with the mouse). To run the whole script, press ctrl + shift + enter.
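Running the whole script (ctrl + shift + enter) is roughly the same as calling source() on the saved file with echo = TRUE. The sketch below writes a tiny hypothetical script to a temporary file first so it can run on its own:

```r
# Hypothetical script file, created here only for illustration
script_file <- file.path(tempdir(), "my_first_script.R")
writeLines(c("a <- 100 + 1", "a * 2"), script_file)

# Run all lines of the script; echo = TRUE prints each command and its output
source(script_file, echo = TRUE)
```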
Assignment: Open a new file. In this file, write down some of the code (one command per line) that we have used so far and save the file.
Now run the code (either by pressing “run” at the top right of the script or ctrl + enter).
You might have noticed sqrt(9) earlier. sqrt() is an R function that calculates the square root of a number. 9 is the argument that we hand over to the function.
If you want to know what a function does, which arguments it takes, or which output it generates, you can type ?functionname (or help(functionname)) in the console, e.g. ?sqrt.
This will open the help file in the Help Pane on the lower right of RStudio.
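Two equivalent ways to open a help page, plus args() for a quick look at a function's arguments without the full help file:

```r
# Open the help page for sqrt() (shown in RStudio's Help pane)
?sqrt

# The same, written as a normal function call
help("sqrt")

# Print only the arguments a function takes
args(sqrt)
```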
Functions often take more than one argument:
You can explicitly state which argument you are handing over (check the help file for the argument names!) or just state the values (but these have to be in the correct order then! See help file).
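An illustration with two base R functions (chosen here as examples; the session may have used others). Named arguments can be given in any order, while unnamed values must follow the order listed in the help file:

```r
# round(): the number (x) and the number of decimal places (digits)
round(3.14159, digits = 2)   # named argument
round(3.14159, 2)            # same result, values in the documented order

# seq(): a sequence of numbers, with arguments from, to and by
seq(from = 1, to = 10, by = 3)   # 1 4 7 10
seq(by = 3, to = 10, from = 1)   # named: order does not matter
seq(1, 10, 3)                    # unnamed: must be in the order from, to, by
```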
There are a number of functions that are already included with Base R, but you can greatly extend the power of R by loading packages. Packages are like libraries of functions that someone else wrote.
You can install a package using the install.packages() function.
(It may be necessary to install Rtools: https://cran.r-project.org/bin/windows/Rtools/)
But installing is not enough to be able to actually use the functions from that package. You also need to load the package with the library() function. You install a package only once, but you load it in every new R session.
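A sketch of the install-then-load pattern. It uses the MASS package only because MASS ships with standard R installations, so the example usually runs without downloading anything; for the assignment, replace "MASS" with "tidyverse". The repos argument sets a CRAN mirror explicitly (RStudio usually sets one for you):

```r
# Install only if the package is missing (needs an internet connection)
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS", repos = "https://cloud.r-project.org")
}

# Load the package (once per R session) so its functions can be used
library(MASS)

# A function from MASS is now available, e.g. fractions()
fractions(0.75)
```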
Assignment: Install and load the tidyverse package (which we will use a lot in this course).
To read in data files, you need to know which format these files have, e.g. .txt or .csv files or some other (proprietary) format. There are packages that enable you to read in data of different formats.
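A minimal sketch of reading a .csv file with base R's read.csv(). Because the course files are not reproduced here, it first writes a tiny hypothetical file to a temporary folder. The tidyverse's readr package offers a similar function, read_csv():

```r
# Hypothetical example data, written to a temporary .csv file
example_file <- file.path(tempdir(), "example.csv")
write.csv(data.frame(id = 1:3, age = c(23, 31, 27)),
          example_file, row.names = FALSE)

# Read the .csv file into a data frame and print it
dat <- read.csv(example_file)
dat
```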
We will use the files from Fundamentals of Quantitative Analysis. Save these in your course folder on your computer (do not open them!). Set your working directory to the course folder.
Delete the text/code in the .Rmd document you just worked on (or add a new header like “Working with data”). Underneath, add a code chunk with the following content:
Run the code chunk!
There are several options to get a glimpse at the data:
Click on the object/variable name in your Environment.
Type View(NameOfObject) in your console, e.g. View(dat).
In the console, type in str(dat) or str(pinfo) to get an overview of the data.
In the console, type in summary(dat).
In the console, type in head(dat).
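To compare these commands without the course files, you can try them on iris, a small example data frame that comes with R (used here instead of dat):

```r
str(iris)          # structure: number of rows/columns, type and first values of each column
summary(iris)      # summary statistics for each column
head(iris)         # first 6 rows
head(iris, n = 3)  # first 3 rows
# View(iris)       # opens a spreadsheet-like viewer in RStudio
```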
What is the difference between these commands?
What is the difference between these objects and the objects/variables that you assigned/saved in your Environment earlier?
The two objects we just read in are data frames, which consist of full datasets. The objects we assigned earlier were simpler variables, which only consisted of single values/words.
Data frames usually have several rows and columns. Remember, the columns are the variables and the rows are the observations.
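A small hypothetical data frame showing columns as variables and rows as observations:

```r
# Each column is a variable, each row an observation (hypothetical data)
people <- data.frame(
  name = c("Anna", "Ben", "Chris"),
  age  = c(23, 31, 27)
)

dim(people)    # number of rows and columns: 3 2
people$age     # one column (variable) as a vector
people[2, ]    # one row (observation)
```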
R Markdown
Quarto
Projects
R scripts are a good way to save your code. However, you should comment your scripts thoroughly (everything after a # is a comment and is not run), so that future you (and potential collaborators) know what happens in your script.
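A sketch of a short commented script (the content is hypothetical). Comment lines ending in four or more dashes are treated by RStudio as section headers, which helps you navigate longer scripts:

```r
# Purpose: practice basic calculations for R Coding Club ----

# Calculations ----
a <- 100 + 1     # store the result of the addition in a
b <- sqrt(9)     # square root of 9

# Combine both values ----
total <- a + b   # 101 + 3 = 104
total
```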
An alternative is an R Markdown file. This is also a sort of script, but you can write text (like in a word processor) and mix it with code chunks, where you write your R code. R Markdown is the “language” you use to write these files; it is a variety of Markdown.
The advantage of R Markdown files (ending with .Rmd) is that they increase reproducibility: text, code and results live in one document, and the results are recomputed from the code every time you knit it. For example, you can write whole reports in R Markdown (and these slides were made with it, too!).
A newer alternative is Quarto, which works very similarly to R Markdown but is more flexible.
Assignment:
Open a new .Rmd file, change/insert the title and author.
Check out the content of it.
Delete and add some of the text on the white background. Change the header (indicated by ##) to “About me” and write something about yourself underneath.
Switch between “Source” and “Visual” in the top left. What changes? What is “Visual”?
In the grey boxes (“code chunks”), add some code. Try to find out how you can add a new code chunk.
Save the file with a sensible name.
What happens when you click on “Knit” (top of Script pane)?
insert inline code
There are many useful things you can do with R Markdown: adding different headers, adding inline code, knitting to PDF, adding pictures or tables… You can also decide whether the code chunks should be visible in the output, etc.
For further information, check out the R Markdown cheatsheet: https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
That’s the lesson on “Getting started with R”!
Next week, we’ll talk about models & probability and learn how to wrangle (= preprocess) data in R!